Clustering Relational Data Based on Randomized Propositionalization
نویسندگان
چکیده
Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying standard clustering algorithms like KMeans to multi-relational data. We describe how random rules are generated and then turned into boolean-valued features. Clustering generally is not straightforward to evaluate, but preliminary experimental results on a number of standard ILP datasets show promising results. Clusters generated without class information usually agree well with the true class labels of cluster members, i.e. class distributions inside clusters generally differ significantly from the global class distributions. The two-tiered algorithm described shows good scalability due to the randomized nature of the first step and the availability of efficient propositional clustering algorithms for the second step.
منابع مشابه
Propositionalization of Relational Learning: An Information Extraction Case Study
This paper develops a new propositionalization approach for relational learning which allows for efficient representation and learning of relational information using propositional means. We develop a relational representation language, along with a relation generation function that produces features in this language in a data driven way; together, these allow efficient representation of the re...
متن کاملApproximate Relational Reasoning by Stochastic Propositionalization
For many real-world applications it is important to choose the right representation language. While the setting of First Order Logic (FOL) is the most suitable one to model the multi-relational data of real and complex domains, on the other hand it puts the question of the computational complexity of the knowledge induction process. A way of tackling the complexity of such real domains, in whic...
متن کاملOn propositionalization for knowledge discovery in relational databases
Propositionalization is a process that leads from relational data and background knowledge to a single-table representation thereof, which serves as the input to widespread systems for knowledge discovery in databases. Systems for propositionalization thus support the analyst during the usually costly phase of data preparation for data mining. Such systems have been applied for more than 15 yea...
متن کاملPropositionalization Through Relational Association Rules Mining
In this paper we propose a novel (multi-)relational classification framework based on propositionalization. Propositionalization makes use of discovered relational association rules and permits to significantly reduce feature space through a feature reduction algorithm. The method is implemented in a Data Mining system tightly integrated with a relational database. It performs the classificatio...
متن کاملEnsemble Relational Learning based on Selective Propositionalization
Dealing with structured data needs the use of expressive representation formalisms that, however, puts the problem to deal with the computational complexity of the machine learning process. Furthermore, real world domains require tools able to manage their typical uncertainty. Many statistical relational learning approaches try to deal with these problems by combining the construction of releva...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007